Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DeltaTriplesManager #1603

Merged
merged 14 commits into from
Nov 8, 2024
Merged

Conversation

joka921
Copy link
Member

@joka921 joka921 commented Nov 6, 2024

The already existing DeltaTriples class maintains a dynamically changing set of insertions and deletions relative to the original input data, together with a (single) local vocab. The class is not threadsafe and has to be used with care. In particular, concurrent update queries have to be serialized, and while a query makes use of the "delta triples", it has to be made sure that they are not changed over the course of the processing of that query.

Both of these problems are solved by the new DeltaTriplesManager class. The index has a single object of this class. It maintains a single DeltaTriples object, write access to which is strictly serialized. Each new query gets a so-called snapshot of the current delta triples. This is a full copy (of the delta triples located in each of the permutations and of the local vocab). These snapshots are read-only and multiple queries can share the same snapshot. A snapshot lives as long as one query using it is still being processed.

Signed-off-by: Johannes Kalmbach <[email protected]>
TODO:
Write comments etc.

Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 89.06250% with 14 lines in your changes missing coverage. Please review.

Project coverage is 89.16%. Comparing base (bb70c4a) to head (d70705b).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/index/Index.cpp 68.75% 5 Missing ⚠️
src/engine/CountAvailablePredicates.cpp 0.00% 4 Missing ⚠️
src/index/IndexImpl.h 25.00% 3 Missing ⚠️
src/index/IndexImpl.cpp 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1603      +/-   ##
==========================================
- Coverage   89.17%   89.16%   -0.01%     
==========================================
  Files         372      372              
  Lines       34579    34635      +56     
  Branches     3912     3912              
==========================================
+ Hits        30835    30883      +48     
- Misses       2471     2479       +8     
  Partials     1273     1273              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Qup42
Copy link
Member

Qup42 commented Nov 7, 2024

Some SPARQL operations may result in Quad insertions and deletions. Is it possible that operations acquire the lock in between these two parts of a single update?

Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1-1 with Johannes, round 1

We discussed some changes.

This is awesome!

Signed-off-by: Johannes Kalmbach <[email protected]>
# Conflicts:
#	test/DeltaTriplesTest.cpp
Signed-off-by: Johannes Kalmbach <[email protected]>
@hannahbast hannahbast changed the title Threadsafe and consistent delta triples Add DeltaTripelsManager Nov 7, 2024
Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now. I have committed a few more minor changes.

I have two questions, which I marked as TODO(Hannah), please explain.

@joka921 joka921 changed the title Add DeltaTripelsManager Add DeltaTriplesManager Nov 8, 2024
@sparql-conformance
Copy link

Copy link

sonarqubecloud bot commented Nov 8, 2024

Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answers and the extended comments, this looks great now.

One more question: CodeCov complains and some of the untested lines look testable. What do you think, should we still address (some of) them in this PR?

@joka921
Copy link
Member Author

joka921 commented Nov 8, 2024

@hannahbast I have also looked at the lines, but they are either completely unrelated to this PR, or they will be adressed anyway in one of the follow-up PRs.

@joka921 joka921 merged commit 1ddf5e0 into ad-freiburg:master Nov 8, 2024
21 of 22 checks passed
@joka921 joka921 deleted the Copy-delta-triples branch November 8, 2024 13:21
hannahbast added a commit that referenced this pull request Nov 10, 2024
Since #1355, the query `LIMIT` is clamped to the value of the QLever-specific `send` parameter. In particular, this broke the display of the result size in the QLever UI, which sets `send=100` when showing the first page of results for a query.

This change restores the old behavior for the `send` parameter, yet makes good use of the new lazy query processing. Specifically, lazily computed results blocks are now processed as follows: (1) the first result blocks before OFFSET are computed but skipped; (2) then results are computed and materialized until the value of the `send` parameter is reached; (3) then results are computed and counted but *not* materialized until the LIMIT is reached; (3) all remaining blocks are not even computed.

The QLever JSON now has two new fields `resultSizeTotal` and `resultSizeExported`, with the corresponding values. For the sake of backwards-compatibility, the old `resultsize` field is still there and has the same value as the `resultSizeTotal` field. For the same reason, the `send` parameter keeps its name for now, but should be renamed to `exportLimit` eventually.

On the side fixed, dropped the hard limit of `MAX_NOF_ROWS_IN_RESULT = 1'000'000` for JSON results. Also fix the compilation error introduced by the interplay of the merge of #1603 and #1607 . Fixes #1605 and #1455 .
ullingerc pushed a commit to ullingerc/qlever that referenced this pull request Nov 12, 2024
The already existing `DeltaTriples` class maintains a dynamically changing set of insertions and deletions relative to the original input data, together with a (single) local vocab. The class is not threadsafe and has to be used with care. In particular, concurrent update queries have to be serialized, and while a query makes use of the "delta triples", it has to be made sure that they are not changed over the course of the processing of that query.

Both of these problems are solved by the new `DeltaTriplesManager` class. The index has a single object of this class. It maintains a single `DeltaTriples` object, write access to which is strictly serialized. Each new query gets a so-called *snapshot* of the current delta triples. This is a full copy (of the delta triples located in each of the permutations and of the local vocab). These snapshots are read-only and multiple queries can share the same snapshot. A snapshot lives as long as one query using it is still being processed.
ullingerc pushed a commit to ullingerc/qlever that referenced this pull request Nov 12, 2024
Since ad-freiburg#1355, the query `LIMIT` is clamped to the value of the QLever-specific `send` parameter. In particular, this broke the display of the result size in the QLever UI, which sets `send=100` when showing the first page of results for a query.

This change restores the old behavior for the `send` parameter, yet makes good use of the new lazy query processing. Specifically, lazily computed results blocks are now processed as follows: (1) the first result blocks before OFFSET are computed but skipped; (2) then results are computed and materialized until the value of the `send` parameter is reached; (3) then results are computed and counted but *not* materialized until the LIMIT is reached; (3) all remaining blocks are not even computed.

The QLever JSON now has two new fields `resultSizeTotal` and `resultSizeExported`, with the corresponding values. For the sake of backwards-compatibility, the old `resultsize` field is still there and has the same value as the `resultSizeTotal` field. For the same reason, the `send` parameter keeps its name for now, but should be renamed to `exportLimit` eventually.

On the side fixed, dropped the hard limit of `MAX_NOF_ROWS_IN_RESULT = 1'000'000` for JSON results. Also fix the compilation error introduced by the interplay of the merge of ad-freiburg#1603 and ad-freiburg#1607 . Fixes ad-freiburg#1605 and ad-freiburg#1455 .
hannahbast pushed a commit that referenced this pull request Nov 14, 2024
PR #1582 and #1603 gave all index-lookup methods access to a snapshot of the (located) delta triples. With this change, these triples are now merged with the original triples during query processing whenever necessary. When an index block does not contain any located triples, the performance for accessing that block is the same as before.

The methods for obtaining the result size of an index scan now have two versions: one for obtaining an approximate size (this is cheap because it can be computed from the metadata of the blocks and the located triples) and one for obtaining the exact size (if there are located triples this is expensive because it requires reading and decompressing a block and merging the located triples).
realHannes pushed a commit to realHannes/qlever that referenced this pull request Nov 15, 2024
PR ad-freiburg#1582 and ad-freiburg#1603 gave all index-lookup methods access to a snapshot of the (located) delta triples. With this change, these triples are now merged with the original triples during query processing whenever necessary. When an index block does not contain any located triples, the performance for accessing that block is the same as before.

The methods for obtaining the result size of an index scan now have two versions: one for obtaining an approximate size (this is cheap because it can be computed from the metadata of the blocks and the located triples) and one for obtaining the exact size (if there are located triples this is expensive because it requires reading and decompressing a block and merging the located triples).
joka921 pushed a commit that referenced this pull request Nov 18, 2024
Add the missing code in `Server.cpp` and `Server.h` to process a parsed update request using the `DeltaTriplesManager` from #1603 .
As of this commit, QLever has an initial beta support for a subset of SPARQL UPDATE with the following limitations (all of which will be added in the future, and the list is far from being complete):
1. Updates are not yet persistent, so a restart or crash of the QLever server will delete all updates.
2. Only a single update request per query is allowed, not the syntax that atomically chains an arbitrary sequence of updates.
3. Only INSERT and DELETE queries are supported, the support for queries that drop or add a complete graph will be added in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants